Abstract: Keeping a large amount of data on a single system makes access slow; replicating that data across a large number of disks reduces the access time. Healthcare data is growing rapidly and is expected to increase significantly in the coming years. It spans several kinds of health data, such as electronic health records (EHR), genomic, behavioral, and public health data, all of which can be handled with big data processing. Many technologies are already used for big data processing of healthcare records, and they rely on predictive analysis, but predictive analysis alone is not sufficient for every kind of health record. In this paper, the Fuzzy C-means clustering algorithm, which produces a centroid-based clustering, is applied together with the ID3 (Iterative Dichotomiser 3) classification algorithm, which builds a decision tree from a fixed set of examples. This combination helps to maintain all kinds of health data at low cost and to provide the right intervention to the right patient at the right time. It is potentially beneficial for all components of a healthcare system: provider, payer, patient, and management. Properly analyzing healthcare data in this way also makes it possible to identify which groups or genders are most affected by particular diseases.
Keywords: Healthcare, Hadoop, big data, clustering, analytics.
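
As a rough illustration of the centroid-based clustering step named in the abstract, the following is a minimal sketch of the standard Fuzzy C-means update rules in plain Python. It is not the paper's implementation: the function names, the one-dimensional toy values, the fuzzifier m = 2, the fixed iteration count, and the initial centers are all assumptions chosen for illustration.

```python
def memberships(points, centers, m=2.0, eps=1e-9):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    Standard FCM rule: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
    eps (an assumption of this sketch) avoids division by zero when a
    point coincides with a center.
    """
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        d = [abs(x - c) + eps for c in centers]
        rows.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return rows


def fuzzy_c_means(points, init_centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(init_centers)
    n = len(points)
    for _ in range(iters):
        u = memberships(points, centers, m)
        # Each center becomes the mean of all points weighted by u_ij ** m.
        centers = [
            sum((u[i][j] ** m) * points[i] for i in range(n))
            / sum(u[i][j] ** m for i in range(n))
            for j in range(len(centers))
        ]
    # Recompute memberships so they match the final centers.
    return centers, memberships(points, centers, m)


# Hypothetical toy measurements forming two loose groups (not from the paper).
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, u = fuzzy_c_means(values, init_centers=[0.0, 10.0])
```

Unlike hard clustering, each record receives a membership degree for every cluster, and the degrees for one record sum to 1. The cluster with the highest degree could then serve as a categorical attribute for a decision-tree learner such as ID3, although how the two algorithms are actually combined is not specified in the abstract.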